Lightweight and Accurate Silent Data Corruption Detection in Ordinary Differential Equation Solvers

Authors

  • Pierre-Louis Guhur
  • Hong Zhang
  • Tom Peterka
  • Emil Constantinescu
  • Franck Cappello
Abstract

Silent data corruptions (SDCs) are errors that corrupt the system or falsify results while remaining unnoticed by firmware or operating systems. In numerical integration solvers, SDCs that impact the accuracy of the solver are considered significant. Detecting SDCs in high-performance computing is necessary because results need to be trustworthy, and because the growing number and complexity of components in emerging large-scale architectures make SDCs more likely to occur. Until recently, SDC detection methods consisted of replicating the processes of the execution or using checksums (for example, algorithm-based fault tolerance). More recently, detection methods have been proposed that rely on mathematical properties of numerical kernels or on data analysis of the results modified by the application. None of these methods, however, provides a lightweight solution guaranteeing that all significant SDCs are detected. We propose a new method called Hot Rod as a solution to this problem. It checks and potentially corrects the data produced by numerical integration solvers. Our theoretical model shows that all significant SDCs can be detected. We present two detectors and conduct experiments on streamline integration from the WRF meteorology application. Compared with the algorithmic detection methods, our first detector increases detection accuracy by 52% with a similar false detection rate. The second detector has a false detection rate one order of magnitude lower than these detection methods while improving the detection accuracy by 23%. The computational overhead is lower than 5% in both cases. The model has been developed for an explicit Runge-Kutta method, although it can be generalized to other solvers.
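To make the idea of checking a solver's output concrete, here is a minimal sketch. It is not the Hot Rod method, whose detectors are not specified in this abstract. It assumes a simple heuristic: an explicit Heun (RK2) step with an embedded Euler solution yields a local error estimate, and a step is flagged and recomputed when that estimate jumps by more than a chosen factor over the previous step. The names `heun_step` and `integrate_with_check`, the injected `fault`, and the threshold `factor` are all hypothetical choices for illustration.

```python
import math


def heun_step(f, t, y, h, corrupt=0.0):
    """One Heun (RK2) step with an embedded Euler solution.

    `corrupt` is a hypothetical additive fault injected into the second
    stage to mimic a silent data corruption.
    """
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1) + corrupt
    y_high = y + 0.5 * h * (k1 + k2)    # second-order result
    y_low = y + h * k1                  # first-order (Euler) result
    return y_high, abs(y_high - y_low)  # new state, local error estimate


def integrate_with_check(f, y0, t0, h, n_steps,
                         fault_step=None, fault=0.0, factor=10.0):
    """Integrate with fixed step h, flagging suspicious steps.

    A step is flagged when its error estimate exceeds `factor` times the
    estimate of the previous accepted step (an assumed heuristic). A
    flagged step is recomputed once, without the injected fault.
    The first step cannot be checked because it has no predecessor.
    """
    t, y = t0, y0
    prev_est = None
    flagged = []
    for i in range(n_steps):
        injected = fault if i == fault_step else 0.0
        y_new, est = heun_step(f, t, y, h, injected)
        if prev_est is not None and est > factor * prev_est:
            flagged.append(i)
            y_new, est = heun_step(f, t, y, h)  # recompute the step
        prev_est = est
        t, y = t + h, y_new
    return y, flagged


def decay(t, y):
    """Test problem y' = -y, whose exact solution is exp(-t)."""
    return -y


if __name__ == "__main__":
    y_end, flags = integrate_with_check(decay, 1.0, 0.0, 0.01, 100,
                                        fault_step=50, fault=1.0)
    print("flagged steps:", flags)
    print("error vs exact:", abs(y_end - math.exp(-1.0)))
```

A fixed ratio threshold like this would miss small corruptions and could raise false alarms on problems whose error estimate legitimately changes quickly, which is the trade-off between detection accuracy and false detection rate that the abstract quantifies.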


Similar resources

Lightweight and Accurate Silent Data Corruption Detection in Ordinary Differential Equation Solvers

Silent data corruptions (SDCs) are errors that corrupt the system or falsify results while remaining unnoticed by firmware or operating systems. In numerical integration solvers, SDCs that impact the accuracy of the solver are considered significant. Detecting SDCs in high-performance computing is necessary because results need to be trustworthy and the increase of the number and complexity of...


Exploring Partial Replication to Improve Lightweight Silent Data Corruption Detection for HPC Applications

Silent data corruption (SDC) poses a great challenge for high-performance computing (HPC) applications as we move to extreme-scale systems. If not dealt with properly, SDC has the potential to influence important scientific results, leading scientists to wrong conclusions. In previous work, our detector was able to detect SDC in HPC applications to a certain level by using the peculiarities of t...


Discrete-time Solutions to the Continuous-time Differential Lyapunov Equation With Applications to Kalman Filtering, Report no. LiTH-ISY-R-3055

Prediction and filtering of continuous-time stochastic processes require a solver of a continuous-time differential Lyapunov equation (cdle). Even though this can be recast into an ordinary differential equation (ode), where standard solvers can be applied, the dominating approach in Kalman filter applications is to discretize the system and then apply the discrete-time difference Lyapunov equation (d...


Rational Heuristics for Rational Solutions of Riccati Equations

We describe some new algorithms and heuristics for computing the polynomial and rational solutions of bounded degree of a class of ordinary differential equations, which includes generalized Riccati equations. As a consequence, our methods can be used for factoring linear ordinary differential equations. Since they generate systems of algebraic equations in at most n unknowns, where n is the order...


TECHNISCHE UNIVERSITÄT BERLIN Analysis and Reformulation of Linear Delay Differential-Algebraic Equations

In this paper, we study general linear systems of delay differential-algebraic equations (DDAEs) of arbitrary order. We show that under some consistency conditions, every linear high-order DAE can be reformulated as an underlying high-order ordinary differential equation (ODE) and that every linear DDAE with single delay can be reformulated as a high-order delay differential equation (DDE). We der...




Publication date: 2016